This study demonstrates how the incremental 4D-Var
data assimilation method can be efficiently preconditioned
in an application to an oceanographic problem.
The approach consists in performing a few iterations of the 
reduced-order 4D-Var prior to the incremental 4D-Var in 
the full space in order to achieve faster convergence. An 
application performed in the tropical Pacific Ocean, with 
assimilation of TAO temperature data, shows the method 
to be both feasible and efficient. It allows the global cost 
of the assimilation to be reduced by a factor of 2 without 
affecting the quality of the solution. 



1. Introduction 

Computational requirements remain a major limiting fac- 
tor for operational forecasting of atmospheric and oceanic 
circulations. In such systems, most of the computation re- 
sources are generally devoted to data assimilation: typically, 
sequential data assimilation may cost one order of magni- 
tude more than a model simulation, and variational data 
assimilation two orders of magnitude more. Even allowing 
for the evolution of computer technology, such constraints 
seem likely to remain for many years to come since numer- 
ous scientific studies have shown that an increase in model 
resolutions is needed per se. Within this context, the aim of 
this letter is to report on how a reduced-order approach to 
variational data assimilation can help decrease the compu- 
tational cost of this method. 

2. Full space and reduced-order incremental 4D-Var

The usual method employed for variational assimilation 
in current meteorological and oceanographical applications 
is the incremental 4D-Var [Courtier et al., 1994]. This
method aims at determining an optimal correction $\delta x$,
defined in the full space, to a first approximation $x^b$ of the
initial condition, which minimizes the functional

$$J(\delta x) = J_b(\delta x) + J_o(\delta x) \qquad (1)$$

$$= \frac{1}{2}\,\delta x^T B^{-1} \delta x + \frac{1}{2} \sum_{i=1}^{N} (H_i M_i \delta x - d_i)^T R_i^{-1} (H_i M_i \delta x - d_i) \qquad (2)$$



where $M_i$ is the tangent linear model between the initial
time $t_0$ and time $t_i$, and $H_i$ is the observation operator,
linearized at time $t_i$. The innovation vector, $d_i$, is the
difference between the observation vector and its model
equivalent at time $t_i$: $d_i = y_i - H_i M_i x^b$.
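
To make the structure of the quadratic cost (1)-(2) concrete, here is a minimal NumPy sketch that evaluates $J(\delta x)$ and its gradient. It assumes small dense matrices; the function and argument names are hypothetical, and matrix transposes stand in for the adjoint model, which a real system such as OPAVAR applies as an operator rather than as an explicit matrix.

```python
import numpy as np

def incremental_cost(dx, B_inv, H_list, M_list, R_inv_list, d_list):
    """Return J(dx) = Jb(dx) + Jo(dx) and its gradient (toy dense version)."""
    # Background term: 0.5 * dx^T B^{-1} dx
    Jb = 0.5 * dx @ B_inv @ dx
    grad = B_inv @ dx
    Jo = 0.0
    for H, M, R_inv, d in zip(H_list, M_list, R_inv_list, d_list):
        G = H @ M                 # increment propagated to t_i, then observed
        r = G @ dx - d            # misfit H_i M_i dx - d_i
        Jo += 0.5 * r @ R_inv @ r
        grad += G.T @ (R_inv @ r) # transpose plays the role of the adjoint
    return Jb + Jo, grad
```

Each call involves one application of every `H @ M` and one of its transpose, which mirrors the one tangent linear run plus one adjoint run needed per iteration of the minimization.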

The background error covariance matrix B is generally 
rather poorly known, and its definition presents a difficult 
problem. This lack of suitable definition impacts strongly 
on the quality of the solution and has been the subject of 
numerous research studies. In the usual incremental 4D- 
Var approach, B may be defined analytically, using, for ex- 
ample, monovariate gaussian-like covariances, and balance 
equations for multivariate covariances. 

In a reduced-order approach (hereafter denoted R-4D-Var),
$B$ and $\delta x$ are specified in a low-dimension subspace
which contains a large part of the natural variability of the
system. An efficient way to build such a subspace is to
define it as the span of a few EOFs, $L_i$ ($i = 1, \ldots, r$),
computed from previous model simulations [Blayo et al., 1998;
Durbiano, 2001; Robert et al., 2005].
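
As an illustration of how such a basis can be obtained, the sketch below computes EOFs from a matrix of model snapshots through a singular value decomposition of the anomalies. It is a generic recipe under simplifying assumptions (a single variable, no normalization between variables or grid-cell weighting), not the authors' exact procedure; `compute_eofs` and its arguments are hypothetical names.

```python
import numpy as np

def compute_eofs(snapshots, r):
    """snapshots: (n_state, n_snap) array, one model state per column.

    Returns the r leading EOFs (columns), their eigenvalues, and the
    fraction of total variance they explain.
    """
    # Remove the time mean so that the EOFs describe variability only
    anomalies = snapshots - snapshots.mean(axis=1, keepdims=True)
    # Left singular vectors are the EOFs; singular values give the variances
    U, s, _ = np.linalg.svd(anomalies, full_matrices=False)
    eigvals = s**2 / (snapshots.shape[1] - 1)  # sample covariance eigenvalues
    explained = eigvals[:r].sum() / eigvals.sum()
    return U[:, :r], eigvals[:r], explained
```

The returned fraction is the quantity one would monitor when choosing the truncation rank $r$.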

Formally, $J_o$ remains unchanged, and only the expressions
for $\delta x$ and $B$ change in $J_b$. The increment $\delta x$ is expanded
in the EOF basis,

$$\delta x = \sum_{i=1}^{r} w_i L_i = L w,$$

and $B$ is then naturally represented by the low-rank matrix
$B_r = L \Lambda_r L^T$ with $\Lambda_r = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$,
where $\lambda_i$ is the eigenvalue corresponding to $L_i$ (see Robert
et al., 2005 for details). Thus, $B_r$ naturally contains 3-D
multivariate covariances. Note that another strategy consists in
reducing the model itself using a POD approach [Cao et al., 2005;
Daescu and Navon, 2006].
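
Under the additional assumptions that the EOFs are orthonormal ($L^T L = I_r$) and that the inverse of $B_r$ is understood as its pseudo-inverse on the EOF subspace, the reduced cost function can be written directly in terms of the coefficient vector $w$. This is a sketch of the algebra rather than a formula quoted from the source; Robert et al., 2005 give the authors' own formulation.

```latex
\begin{align*}
B_r^{+} &= L \Lambda_r^{-1} L^T, \\
J_b(w) &= \tfrac{1}{2} (Lw)^T B_r^{+} (Lw) = \tfrac{1}{2}\, w^T \Lambda_r^{-1} w, \\
J_o(w) &= \tfrac{1}{2} \sum_{i=1}^{N} (H_i M_i L w - d_i)^T R_i^{-1} (H_i M_i L w - d_i), \\
\nabla_w J &= \Lambda_r^{-1} w + L^T \sum_{i=1}^{N} M_i^T H_i^T R_i^{-1} (H_i M_i L w - d_i).
\end{align*}
```

The control vector is then $w$, of dimension $r$, instead of the full-space increment $\delta x$.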



Both full space and reduced-order 4D-Var methods have 
been applied in the context of data assimilation in a prim- 
itive equation model of the tropical Pacific Ocean. The 
model is the OPA code [Madec et al., 1998] in its so-called
TDH configuration, with the variational data assimilation 
package OPAVAR [Weaver et al., 2003]. Numerous earlier 
studies have been conducted using this configuration with 
the incremental 4D-Var, assimilating TAO and XBT tem- 
perature profiles, and have produced good results [Weaver 
et al., 2003; Vialard et al., 2003]. In these studies, the back- 
ground error covariance matrix B is defined as an operator 
including gaussian-like covariance functions, but it remains 
monovariate [Weaver and Courtier, 2001]. Efforts to de- 
velop a multivariate operator have been made recently and 
implemented in the 3D-Var context [Ricci et al., 2005]. 

This incremental 4D-Var approach needs quite a large
number of iterations (typically 40) to converge [Vialard et
al., 2003; Weaver et al., 2003]. Since each iteration requires
one run of both the tangent linear model and the adjoint
model, the computational cost can become prohibitive
for real configurations. In a non-linear situation such as the
mid-latitude ocean, the cost will be even greater [Blum et
al., 1998].

By working only in a low-dimension space, and thereby
optimizing a very limited number of coefficients (we have
retained 30 EOFs, explaining 92% of the total variability),
the R-4D-Var aims in particular to overcome this
drawback. This method has proved to work well in the
idealized context of twin experiments, i.e. when the model
assimilates simulated observations and is thus supposed to
be perfect [Robert et al., 2005]. The computational cost is
decreased in this case by a factor of at least 3, and the
optimal increment is very well identified in the space spanned
by the EOFs.

3. A Two-Step 4D-Var strategy 

When dealing with real data, the definition of a relevant 
EOF basis, representative of the true variability of the sys- 
tem, is much more challenging. Since the model has signif- 
icant errors, EOFs computed from model simulations with- 
out assimilation do not contain the right information and 
lead to poor R-4D-Var results. One possible way to address 
this difficulty is to compute EOFs from a previous assim- 
ilated run, with another assimilation method for example. 
However, since the system providing the EOFs remains in 
any case imperfect, the low-dimension correction subspace 
does not contain all the relevant variability of the true dy- 
namics. The key idea of this study is therefore to combine 
the quality of the identification performed by 4D-Var and 
the efficiency of R-4D-Var. This means that we continue to 
look for an optimal correction in the full space, but ad- 
dress the problem of the associated high computational cost 
by using R-4D-Var to provide a relevant initial guess for full 
space minimization. The approach thus consists in performing
a few iterations of R-4D-Var (without trying to reach a
converged solution), and in using this current estimate of
$\delta x$ to initialize additional iterations of 4D-Var. Only a few
of the first iterations of R-4D-Var appear to be effective in
providing a relevant first guess, defined in the reduced space.
This first guess can then be corrected and improved in the
full space by a few iterations of 4D-Var. This technique can
thus be seen as a preconditioning of 4D-Var by R-4D-Var,
and will hereafter be referred to as "Two-Step 4D-Var"
(TS-4D-Var). Note that the expression of $B$ changes between
R-4D-Var and 4D-Var. This means that the role of $J_b$ is
identical in both methods (to minimize $\|\delta x\|$) but with
different norms. The other functional $J_o$ remains unchanged.
The cost functions being quadratic (due to the incremental 
approach), any unconstrained minimization algorithm may 
be used. We use here a BFGS-like algorithm. 
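
The two-step strategy described above can be sketched on a toy linear problem as follows. Several assumptions are made for illustration only: all $H_i M_i$ are stacked into one dense matrix `G` (with a matching `R_inv` and `d`), the EOFs in `L` are orthonormal, and a plain conjugate gradient replaces the BFGS-like minimizer used in the study. The names `conjugate_gradient` and `two_step` are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_iter):
    """Run at most n_iter CG iterations on 0.5 x^T A x - b^T x (A SPD)."""
    x = x0.copy()
    r = b - A @ x          # residual = minus the gradient
    p = r.copy()
    for _ in range(n_iter):
        rr = r @ r
        if rr < 1e-30:     # already converged
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        p = r + (r @ r) / rr * p
    return x

def two_step(B_inv, G, R_inv, d, L, lam, n_reduced, n_full):
    """Toy TS-4D-Var: a few reduced-space iterations, then full-space ones."""
    A_obs = G.T @ R_inv @ G            # Hessian of Jo
    b = G.T @ R_inv @ d
    # Step 1 (R-4D-Var): minimize over the EOF coefficients w, starting from 0
    A_r = np.diag(1.0 / lam) + L.T @ A_obs @ L
    w = conjugate_gradient(A_r, L.T @ b, np.zeros(L.shape[1]), n_reduced)
    dx0 = L @ w                        # first guess, lifted to the full space
    # Step 2 (4D-Var): continue in the full space with the full-space B
    dx = conjugate_gradient(B_inv + A_obs, b, dx0, n_full)
    return dx0, dx
```

In this linear toy there is no reference trajectory to update between the two steps; in a non-linear system, a run of the full model would be inserted between step 1 and step 2.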

4. Assessing the Two-Step strategy 

In order to validate this TS strategy, experiments are per- 
formed in the tropical Pacific Ocean (Fig. 1), using the OPA 
model in its TDH configuration. The atmospheric forcings 
are daily ERS-TAO winds [Menkes et al., 1998] and monthly
ECMWF heat fluxes. Our experiments start in January
1993 and last one year. The dynamics during this period
is representative of a "normal" year in the tropical Pacific,
without the noteworthy influence of any ENSO event. These 
conditions are seen as favorable to test the method. 

In order to be able to compare our results with those 
of previous studies performed with the same configuration 
[Vialard et al., 2003], we assimilate temperature data from
the TAO/TRITON array plus XBT. 

In the TS-4D-Var, we first perform 10 iterations of the
R-4D-Var. These are followed by a run of the fully non-linear
direct model in order to update the reference trajectory for
linearization. Then, 10 iterations of the incremental 4D-Var
are performed. Thus, the cycle of the TS-4D-Var is like a
cycle of the 4D-Var with 2 outer loops consisting of 10 inner
loops each.

Figure 2 shows the two successive sequences of the
observation term $J_o$ in the cost function. We can see that the
R-4D-Var performs a first descent (10 iterations, although
5-6 may be sufficient) followed by a second one performed
by the incremental 4D-Var. The second sequence allows us
to reach a lower level very quickly and to stabilize the value
of the cost function after very few iterations. Thus, the
minimization phase requires fewer iterations to
reach the minimum (fewer than 20, whereas 40 are required
with the incremental 4D-Var), because the R-4D-Var step
retains only the 30 leading EOFs.

For both assimilated and non-assimilated variables, the
TS-4D-Var provides results that are closely comparable
to those obtained with the incremental 4D-Var. To illustrate
this point, Figures 3 and 4 show a comparison of the fields
at a particular location in the eastern part of the domain,
at (110°W, 0°N) (see Fig. 1). This location corresponds to the
area where the most intense non-linearities occur. These
are due to the Tropical Instability Waves which arise and
propagate there, becoming increasingly intense from
mid-June/early July. This area is thus also where identification
of the solution is the most difficult. To compare the free run,
the incremental 4D-Var and the TS-4D-Var with TAO data, we
have plotted time-depth diagrams of the absolute difference
between model and TAO temperature and zonal velocity. With
regard to temperature (Fig. 3), the free run shows significant
departures from TAO data. The thermal field is correctly
represented, however, by the two assimilation methods, which
lead to very comparable results with the same low level of
absolute error, located only in the first hundred meters.
Concerning zonal velocity, which is a non-assimilated variable,
the results are also satisfactory. Both assimilation methods
succeed in representing the inversion of the surface current,
and the global level of the absolute error is of the same order
of magnitude for the TS-4D-Var as for the 4D-Var, and even
slightly lower (see Fig. 4).
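
The diagnostic behind such time-depth diagrams is a pointwise absolute difference at the mooring location. A minimal sketch follows, assuming the model and TAO profiles have already been interpolated onto a common (time, depth) grid and that missing observations are stored as NaN; the function names are hypothetical.

```python
import numpy as np

def abs_difference(model, obs):
    """Pointwise |model - obs| on a (n_time, n_depth) grid; NaN where obs is missing."""
    return np.abs(model - obs)

def mean_abs_error(model, obs):
    """Average of the absolute difference over all available observations."""
    return np.nanmean(abs_difference(model, obs))
```

The first function yields the field that is contoured in a time-depth diagram, while the second condenses it into a single number for comparing runs.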

5. Conclusion 

In this letter we have proposed a new method to im- 
prove the efficiency of the 4D-Var assimilation method. Our 
Two-Step 4D-Var can be seen as a preconditioned 4D-Var, 
which greatly decreases the computational cost of assimila- 
tion. This approach is validated by assimilating real in-situ 
temperature profiles in a realistic model of the tropical Pa- 
cific Ocean. The results provided by the TS-4D-Var, for 
both assimilated and non-assimilated variables, are of simi- 
lar quality to those obtained by the 4D-Var, but the cost is 
divided by a factor of 2. For expensive configurations, this 
method would thus seem to be an attractive alternative to 
the full space 4D-Var approach.

Figure 1. Geographical extent of the model, location
of the TAO/TRITON array points and location of the
point chosen for the time-depth diagrams displayed in
Fig. 3 and Fig. 4 (X symbol).

Figure 2. Cost function for the observations $J_o$ for each
method, as a function of time (in months). Top panel:
4D-Var with 44 iterations for each assimilation window.
Bottom panel: TS-4D-Var with 22 iterations for each
assimilation window (dashed black line: R-4D-Var, black
line: full 4D-Var).

Figure 3. Absolute difference in the temperature fields
at (110°W, 0°N), between model and TAO data, as a
function of time (horizontal axis) and depth (vertical
axis). Top panel: free run. Middle panel: incremental
4D-Var. Bottom panel: TS-4D-Var.

Figure 4. Absolute difference in the zonal velocity fields
at (110°W, 0°N), between model and TAO data, as a
function of time (horizontal axis) and depth (vertical
axis). Top panel: free run. Middle panel: incremental
4D-Var. Bottom panel: TS-4D-Var.



